Fuzzy State Aggregation and Policy Hill Climbing for Stochastic Environments
Authors
Abstract
Reinforcement learning is one of the more attractive machine learning technologies due to its unsupervised learning structure and its ability to continue learning even as the operating environment changes. Additionally, applying reinforcement learning to multiple cooperative software agents (a multi-agent system) not only allows each individual agent to learn from its own experience, but also opens up the opportunity for the individual agents to learn from the other agents in the system, thus accelerating the rate of learning. This research presents the novel use of fuzzy state aggregation (FSA) as the means of function approximation, combined with the fast policy hill climbing (PHC) methods Win or Learn Fast (WoLF) and policy-dynamics-based WoLF (PD-WoLF). The combination of fast policy hill climbing and fuzzy state aggregation function approximation is tested in two stochastic environments: Tileworld and the simulated robot soccer domain, RoboCup. The Tileworld results demonstrate that a single agent using the combination of FSA and PHC learns more quickly and performs better than an agent using fuzzy state aggregation with Q-learning alone. Results from the multi-agent RoboCup domain again illustrate that the policy hill climbing algorithms outperform Q-learning alone in a multi-agent environment. Learning is further enhanced by allowing the agents to share their experience through weighted strategy sharing.
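For concreteness, the sketch below shows the standard WoLF-PHC update for a discrete state/action space: a Q-learning backup followed by a hill-climbing step on the policy, taken with a small learning rate when the agent is "winning" (its current policy outperforms its average policy under the current Q-values) and a larger one when "losing". The class name, hyperparameter values, and tabular representation are illustrative assumptions; the paper itself replaces the tabular state with fuzzy state aggregation.

```python
import random
from collections import defaultdict

class WoLFPHCAgent:
    """Minimal tabular WoLF-PHC learner (illustrative sketch, not the paper's code)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        self.actions = list(actions)
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose  # delta_lose > delta_win
        n = len(self.actions)
        self.q = defaultdict(lambda: {a: 0.0 for a in self.actions})          # Q(s, a)
        self.pi = defaultdict(lambda: {a: 1.0 / n for a in self.actions})     # current policy
        self.pi_avg = defaultdict(lambda: {a: 1.0 / n for a in self.actions}) # average policy
        self.visits = defaultdict(int)

    def act(self, state):
        # Sample an action from the current mixed policy pi(state, .).
        r, acc = random.random(), 0.0
        for a in self.actions:
            acc += self.pi[state][a]
            if r <= acc:
                return a
        return self.actions[-1]

    def update(self, s, a, reward, s_next):
        # 1. Ordinary Q-learning backup.
        self.q[s][a] += self.alpha * (
            reward + self.gamma * max(self.q[s_next].values()) - self.q[s][a])

        # 2. Incrementally track the average policy for state s.
        self.visits[s] += 1
        for b in self.actions:
            self.pi_avg[s][b] += (self.pi[s][b] - self.pi_avg[s][b]) / self.visits[s]

        # 3. WoLF criterion: winning if the current policy's expected value
        #    beats the average policy's expected value under the current Q.
        v_pi = sum(self.pi[s][b] * self.q[s][b] for b in self.actions)
        v_avg = sum(self.pi_avg[s][b] * self.q[s][b] for b in self.actions)
        delta = self.delta_win if v_pi > v_avg else self.delta_lose

        # 4. Hill-climb: shift probability mass toward the greedy action.
        best = max(self.actions, key=lambda b: self.q[s][b])
        for b in self.actions:
            if b == best:
                self.pi[s][b] = min(1.0, self.pi[s][b] + delta)
            else:
                self.pi[s][b] = max(0.0, self.pi[s][b] - delta / (len(self.actions) - 1))
        total = sum(self.pi[s].values())  # renormalize to a valid distribution
        for b in self.actions:
            self.pi[s][b] /= total
```

Swapping the tabular state lookup for membership-weighted fuzzy aggregates, as in the FSA sketch further below, yields the combination the paper studies.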
Related papers
Fuzzy State Aggregation and Off-policy Reinforcement Learning for Stochastic Environments
Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and its ability to continually learn even as the environment it is operating in changes. This ability to learn in an unsupervised manner in a changing environment makes it applicable to complex domains through the use of function approximation of the domain's policy. The function...
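As a rough illustration of the fuzzy state aggregation idea shared by this paper and the main article, the sketch below keeps one Q-vector per fuzzy set over a one-dimensional continuous state and answers queries with a membership-weighted combination, spreading each temporal-difference update across the sets in proportion to membership. The triangular membership function, set centers, and parameter names are assumptions made for illustration, not the authors' design.

```python
import numpy as np

def triangular(x, left, center, right):
    """Degree of membership of x in a triangular fuzzy set (illustrative choice)."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)

class FuzzyStateQ:
    """Q-function approximated over fuzzy aggregate states (assumed design sketch)."""

    def __init__(self, centers, n_actions, width=1.0):
        self.centers = list(centers)  # centers of the fuzzy sets along the state axis
        self.width = width            # half-width of each triangular set
        self.q = np.zeros((len(self.centers), n_actions))

    def memberships(self, x):
        # Normalized membership of state x in every fuzzy set.
        mu = np.array([triangular(x, c - self.width, c, c + self.width)
                       for c in self.centers])
        s = mu.sum()
        return mu / s if s > 0 else mu

    def value(self, x, action):
        # Q(x, a) is the membership-weighted sum of per-set Q-values.
        return self.memberships(x) @ self.q[:, action]

    def update(self, x, action, target, alpha=0.1):
        # Distribute the TD error over the fuzzy sets in proportion to membership.
        mu = self.memberships(x)
        self.q[:, action] += alpha * (target - self.value(x, action)) * mu
```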
An Improved MPPT Method of Wind Turbine Based on HCS Method by Using Fuzzy Logic System
This paper presents a Maximum Power Point Tracking (MPPT) technique based on the Hill Climbing Search (HCS) method and a fuzzy logic system for Wind Turbines (WTs) with a Permanent Magnet Synchronous Generator (PMSG). In the conventional HCS method the step size is constant; therefore good steady-state response and good dynamic response cannot both be provided at the same time an...
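For readers unfamiliar with HCS, the sketch below shows the conventional fixed-step version the abstract criticizes: perturb the operating point, and reverse direction whenever the output power drops. The `measure_power` and `set_duty` interfaces are hypothetical, not from the paper; its fuzzy logic system would adapt the step size online (e.g., from the magnitude of the last power change) instead of keeping it fixed.

```python
def hcs_mppt(measure_power, set_duty, duty=0.5, step=0.02, n_iters=200):
    """Fixed-step hill climbing search for the maximum power point (sketch).

    measure_power() and set_duty() are hypothetical converter interfaces.
    A fuzzy-logic variant would scale `step` online rather than fix it.
    """
    set_duty(duty)
    p_prev = measure_power()
    for _ in range(n_iters):
        duty = min(max(duty + step, 0.0), 1.0)  # perturb, staying in [0, 1]
        set_duty(duty)
        p = measure_power()
        if p < p_prev:   # power fell: we stepped past the peak, reverse direction
            step = -step
        p_prev = p
    return duty
```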
Rational Learning of Mixed Equilibria
This paper investigates the problem of policy learning in multi-agent environments using the stochastic game framework, which we briefly overview. We introduce two properties as desirable for a learning agent in the presence of other learning agents, namely rationality and convergence. We examine existing reinforcement learning algorithms according to these two properties and notice that th...
Stochastic Enforced Hill-Climbing
Enforced hill-climbing is an effective deterministic hill-climbing technique that deals with local optima using breadth-first search (a process called "basin flooding"). We propose and evaluate a stochastic generalization of enforced hill-climbing for online use in goal-oriented probabilistic planning problems. We assume a provided heuristic function estimating expected cost to the goal with fla...
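The deterministic baseline the abstract builds on can be sketched compactly: from the current state, breadth-first search ("basin flooding") runs until it finds a state with a strictly better heuristic value, then the search jumps there. The function signature below is invented for illustration, and the paper's stochastic generalization is not reproduced.

```python
from collections import deque

def enforced_hill_climbing(start, neighbors, heuristic, is_goal):
    """Deterministic enforced hill-climbing (sketch with an assumed interface)."""
    current = start
    while not is_goal(current):
        h0 = heuristic(current)
        frontier, seen, found = deque([current]), {current}, None
        # Breadth-first "basin flooding" until a strictly better state appears.
        while frontier and found is None:
            s = frontier.popleft()
            for n in neighbors(s):
                if n in seen:
                    continue
                if is_goal(n) or heuristic(n) < h0:
                    found = n
                    break
                seen.add(n)
                frontier.append(n)
        if found is None:
            raise RuntimeError("exhausted the basin without improving the heuristic")
        current = found
    return current
```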
Direct Policy Search and Uncertain Policy Evaluation
Reinforcement learning based on direct search in policy space requires few assumptions about the environment. Hence it is applicable in certain situations where most traditional reinforcement learning algorithms based on dynamic programming are not, especially in partially observable, deterministic worlds. In realistic settings, however, reliable policy evaluations are complicated by numerous s...
Journal: International Journal of Computational Intelligence and Applications
Volume: 6
Issue: -
Pages: -
Published: 2006